0 Introduction

0.1 Goal

0.2 Data

0.3 Recap of Visualization and Data Analysis

0.4 Summary

0.5 In This Notebook

0.6 General Setup

The setup is the same in all sections unless specified otherwise.


1 Preparing The Data for Regression

1.1 Cleaning

1.2 Filtering the data

1.3 Constructing the regression dataframe
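Since later sections select among "auto-regressors", the regression dataframe presumably pairs each target value with lagged copies of the series. A minimal sketch of that construction, assuming a single series column (the name `value` and the lag count are illustrative, not taken from the notebook):

```python
import numpy as np
import pandas as pd

# Toy series standing in for the cleaned data; "value" is a hypothetical column name.
idx = pd.date_range("2020-01-01", periods=10, freq="D")
df = pd.DataFrame({"value": np.arange(10, dtype=float)}, index=idx)

def make_lagged_frame(series: pd.Series, n_lags: int) -> pd.DataFrame:
    """Build a regression dataframe with lag-1..lag-n auto-regressor columns."""
    frame = pd.DataFrame({"y": series})
    for lag in range(1, n_lags + 1):
        frame[f"lag_{lag}"] = series.shift(lag)
    # Drop the leading rows that lack a full lag history.
    return frame.dropna()

reg_df = make_lagged_frame(df["value"], n_lags=3)
```

Each row then holds the target `y` alongside its three most recent past values, ready for a linear model.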

2 Feature Selection

2.0 Theory

Methods

Evaluation metrics

Notes


2.1 Method 1: highly correlated features


Tune number of highly correlated auto-regressors


Calculating error measures for the selected number of features
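The selection-by-correlation step can be sketched as follows: rank features by absolute correlation with the target, keep the top k, fit a linear model, and report error measures. This is a minimal illustration on synthetic data, not the notebook's actual pipeline; the choice of MAE and RMSE as error measures is an assumption:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Synthetic stand-in: y depends mostly on the first two columns.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(200, 6)), columns=[f"x{i}" for i in range(6)])
y = 3 * X["x0"] - 2 * X["x1"] + rng.normal(scale=0.1, size=200)

# Rank features by absolute correlation with the target and keep the top k.
k = 2
corr = X.apply(lambda col: col.corr(y)).abs().sort_values(ascending=False)
selected = corr.index[:k].tolist()

# Fit on the selected features and compute error measures.
model = LinearRegression().fit(X[selected], y)
pred = model.predict(X[selected])
mae = mean_absolute_error(y, pred)
rmse = np.sqrt(mean_squared_error(y, pred))
```

Tuning the number of auto-regressors then amounts to sweeping k and comparing the resulting errors (ideally on held-out data, since in-sample error only decreases as k grows).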


2.2 Method 2: RFE+CV
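Recursive feature elimination with cross-validation (RFECV) repeatedly drops the weakest feature and lets CV pick the best feature count. A self-contained sketch with scikit-learn's `RFECV` on synthetic data (the scoring metric and fold count are assumptions):

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

# Synthetic data with 10 candidate features, 4 of which are informative.
X, y = make_regression(n_samples=200, n_features=10, n_informative=4,
                       noise=0.1, random_state=0)

# RFE drops the weakest feature each step; CV picks the best feature count.
selector = RFECV(
    estimator=LinearRegression(),
    step=1,
    cv=KFold(n_splits=5),
    scoring="neg_mean_squared_error",
)
selector.fit(X, y)

n_selected = selector.n_features_
mask = selector.support_  # boolean mask over the original feature columns
```

Unlike the correlation filter, RFE is a wrapper method: it scores feature subsets by actually refitting the model, so it can account for redundancy between features at higher computational cost.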

2.3 Comparing Feature Selection Methods

3 Visualizing Predictions

Goal: Produce plots to evaluate the performance of the linear model

3.1 Residual plots: Plot residual (predicted-actual) vs. predicted

Ideal:

Observations:

Interpretation:

Consequences of heteroscedasticity:

ref: https://www.qualtrics.com/support/stats-iq/analyses/regression-guides/interpreting-residual-plots-improve-regression/
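A residual plot of this kind can be produced in a few lines: fit the model, compute residuals as predicted minus actual (matching the section title), and scatter them against the predictions. A minimal sketch on synthetic data; the plot styling and output filename are illustrative:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -0.5, 2.0]) + rng.normal(scale=0.3, size=200)

pred = LinearRegression().fit(X, y).predict(X)
resid = pred - y  # residual as (predicted - actual), per the section title

fig, ax = plt.subplots()
ax.scatter(pred, resid, s=10, alpha=0.6)
ax.axhline(0.0, color="red", linewidth=1)
ax.set_xlabel("Predicted")
ax.set_ylabel("Residual (predicted - actual)")
fig.savefig("residuals.png")
```

Ideally the points form a structureless band around zero; a funnel shape indicates heteroscedasticity, and curvature indicates a missing non-linear term.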

3.2 Predictions in different months

Observations:

3.3 Actual vs. Prediction

Observations:

4 Statistical Analysis of Ordinary Least Squares with statsmodels.api

4.0 Theory:

Evaluation measures:

Interpreting the coefficients:

More statistics:

Ref: https://www.accelebrate.com/blog/interpreting-results-from-linear-regression-is-the-data-appropriate

Package: https://www.statsmodels.org/stable/generated/statsmodels.regression.linear_model.RegressionResults.html

Significance of regressors:

The results above show that not all regressors were significant. We remove the insignificant features and train another model on the reduced feature set.

Result: AIC improves but other measures do not change significantly.

5 Summary

Overview:

What has been done so far:

Results:

To do next: Improving the fit

Robustness analysis:

Federation: